1. These are really simple tricks.

• You can do much cooler stuff with entropy etc.
NOT IDS/IPS stuff:

• learning the "normal" values, patterns

• statistical training → black box

• once trained, hard to understand or tweak

2. Not a survey of research literature (but see last slides).



What can they do for us in everyday log browsing? 
• The lines you need are only
20 PgDns away:

• ...each one surrounded by 
a page of chaff... 

• ...in a twisty maze of 
messages, all alike... 

• ...but slightly different, in 
ways you don't expect. 
Uncertainty



Sorting, grouping & filtering: 

• Shows max and min values in a field 

• Groups together packets with the 
same value 

• Drills down to an "interesting" group 



1. Where to start? Which column or protocol feature to pick?
2. How to group? Which grouping helps best to understand the overall data?
3. How to automate guessing (1) and (2)?



• Most lines in a large log will never be examined directly.

• One just needs to convince oneself that one has seen
everything interesting.

• To zero in on the "interesting stuff", one must fold away and ignore the
rest.



Must deal with uncertainty about the rest of the log. 



There is a measure of uncertainty: entropy. 



"Look at most frequent and least frequent values" in a sorted 
column or list. 




• Which column to start with? 

• What if there are many columns and many batches of 
data? 

It would be nice to begin with "more promising" columns or 
features. 



1. Start with a data summary based on the columns with the simplest value frequency charts (histograms).
2. Simplicity → less uncertainty → smaller entropy.



"When most groups in the data look similar in some projection, 
but a few do not, look at those more closely". 



• Which grouping to start with? 

• Which feature pairs/tuples? 
Too many to try by hand. 



1. Try to rank projections before looking, and look at the simpler correlations first.
2. Simplicity → stronger correlation between features → smaller conditional entropy.
[Figure: histogram of dst_port values, sorted by frequency. Each bar is the count of packets in one bin, e.g. the "dst_port = 445" bin.]



Let a random variable X take values $x_1, x_2, \ldots, x_k$ with probabilities $p_1, p_2, \ldots, p_k$.

The entropy of X is

$$H(X) = \sum_{i=1}^{k} p_i \log_2 \frac{1}{p_i}$$

where $p_i$ is the probability of the value $x_i$, for all $i = 1, \ldots, k$.













1. Entropy measures the uncertainty or lack of information about the values of a variable.
2. Entropy is related to the number of bits needed to encode the missing information (to full certainty).



The least number of bits needed to encode numbers between 1
and N is $\log_2 N$.



• You are to receive one of N objects, equally likely to be 
chosen. 

• What is the measure of your uncertainty? 



The number of bits needed to communicate the number of the
object (and thus remove all uncertainty), i.e. $\log_2 N$.

If some object is more likely to be picked than others, 
uncertainty decreases. 
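As a quick check of the claim above, assuming the standard Shannon entropy $H(X) = \sum_i p_i \log_2 (1/p_i)$, setting every $p_i = 1/N$ recovers the bit count for N equally likely objects:

```latex
H(X) = \sum_{i=1}^{N} \frac{1}{N} \log_2 \frac{1}{1/N}
     = N \cdot \frac{1}{N} \log_2 N
     = \log_2 N
```

Any non-uniform choice of the $p_i$ gives a smaller value, which matches the statement that uncertainty decreases when some object is more likely to be picked than others.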




Entropy is a measure of uncertainty about the 
value of X 



1. X = (.25 .25 .25 .25) : H(X) = 2 (bits)
2. X = (.5 .3 .1 .1) : H(X) = 1.685
3. X = (.8 .1 .05 .05) : H(X) = 1.022
4. X = (1 0) : H(X) = 0

For only one value, the entropy is 0.

When all N values have the same frequency,
the entropy is maximal, $\log_2 N$.
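A minimal Python sketch (not taken from the Kerf source) that recomputes the example entropies above and ranks the columns of a toy log by entropy, simplest first. The helper names `entropy`, `entropy_of_probs` and `rank_columns`, and the sample records, are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def entropy_of_probs(probs):
    """Entropy in bits of an explicit probability vector; zero terms add 0."""
    return sum(p * log2(1 / p) for p in probs if p > 0)


def entropy(values):
    """Entropy in bits of the empirical value frequencies in `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum((n / total) * log2(total / n) for n in counts.values())


def rank_columns(records):
    """Return (column, entropy) pairs, lowest-entropy column first."""
    columns = records[0].keys()
    scores = {c: entropy(r[c] for r in records) for c in columns}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical toy log: field names follow the slides, values are made up.
records = [
    {"src_ip": "10.0.0.1", "dst_port": 445},
    {"src_ip": "10.0.0.2", "dst_port": 445},
    {"src_ip": "10.0.0.3", "dst_port": 445},
    {"src_ip": "10.0.0.4", "dst_port": 80},
]

print(round(entropy_of_probs([.25, .25, .25, .25]), 3))  # 2.0
print(round(entropy_of_probs([.5, .3, .1, .1]), 3))      # 1.685
print(round(entropy_of_probs([.8, .1, .05, .05]), 3))    # 1.022
print(round(entropy_of_probs([1, 0]), 3))                # 0.0
print([name for name, _ in rank_columns(records)])       # ['dst_port', 'src_ip']
```

In the toy log, dst_port has the simpler histogram (one dominant value), so it is suggested as the first column to look at, while src_ip, with four equally frequent values, has the maximal entropy of 2 bits.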
Conditional entropy of Y given X

$$H(Y|X) = H(X, Y) - H(X)$$

I.e.: how much uncertainty about Y remains
once we know X.













Mutual information of two variables X and Y

$$I(X; Y) = H(X) + H(Y) - H(X, Y)$$

Reduction in the uncertainty about X once
we know Y and vice versa.
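The two formulas above can be estimated directly from paired column values. The following is a sketch under stated assumptions, not the Kerf implementation: the function names and the sample columns are hypothetical, and probabilities are taken as plain value frequencies.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Entropy in bits of the empirical value frequencies in `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum((n / total) * log2(total / n) for n in counts.values())


def conditional_entropy(xs, ys):
    """H(Y|X) = H(X, Y) - H(X), with (x, y) pairs as the joint variable."""
    return entropy(zip(xs, ys)) - entropy(xs)


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - entropy(zip(xs, ys))


# Hypothetical columns of a four-line log (values are made up).
dst_port = [445, 445, 80, 80]
src_ip = ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.2"]  # fixed by dst_port
src_port = [1024, 1025, 1024, 1025]                        # unrelated to dst_port

print(conditional_entropy(dst_port, src_ip))    # 0.0
print(mutual_information(dst_port, src_ip))     # 1.0
print(conditional_entropy(dst_port, src_port))  # 1.0
print(mutual_information(dst_port, src_port))   # 0.0
```

A pair with small conditional entropy (here dst_port and src_ip) is a strongly correlated, "simple" projection and a candidate to look at first. A pair with zero mutual information tells nothing about one column given the other.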
[Screenshot: tree view of log records grouped by dst_port (80, 21, 4899, 4000, 443, 139, 1524, ...), each group annotated with src_ip, dst_ip and src_port summaries.]
Information theory provides useful heuristics for:

• summarizing log data in medium size batches, 

• choosing data views that show off interesting features of a 
particular batch, 

• finding good starting points for analysis. 
Helpful even with simplest data organization tricks. 



$H(X)$, $H(X|Y)$, $I(X;Y)$, ...: parts of a complete analysis kit!
Research on using entropy and related measures for network
anomaly detection: 

• Information-Theoretic Measures for Anomaly Detection, 
Wenke Lee & Dong Xiang, 2001 

• Characterization of Network-Wide Anomalies in Traffic Flows,
Anukool Lakhina, Mark Crovella & Christophe Diot, 2004

• Detecting Anomalies in Network Traffic Using Maximum 
Entropy Estimation, Yu Gu, Andrew McCallum & Don 
Towsley, 2005 
For source code (GPL), documentation, and technical reports:
http://kerf.cs.dartmouth.edu 



